Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello
Authors
Abstract
We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. In addition to three popular single-criteria performance measures, namely (i) generalization performance (expected utility), (ii) average result against a hand-crafted heuristic, and (iii) the result of a head-to-head match, we compare the algorithms using performance profiles. This multi-criteria performance measure characterizes a player's performance against opponents of varying strength. The multi-criteria analysis reveals that although the generalization performance of players produced by the two algorithms is similar, TDL is much better at playing against strong opponents, while CEL copes better with weak ones. We also find that TDL produces less diverse strategies than CEL. Our results confirm the usefulness of performance profiles as a tool for comparing learning algorithms for games.
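As a rough illustration of what a performance profile captures, the following Python sketch groups game results by opponent strength and averages the player's score within each group. The function name `performance_profile`, the strength scale in [0, 1], and the equal-width binning are assumptions made for illustration; the abstract does not specify how the authors construct their profiles.

```python
from typing import List, Sequence, Tuple


def performance_profile(
    results: Sequence[Tuple[float, float]],
    n_bins: int = 10,
) -> List[Tuple[float, float, float, int]]:
    """Average a player's score per opponent-strength bin.

    results: (opponent_strength, score) pairs, where opponent_strength is
        assumed to lie in [0, 1] (hypothetical scale) and score is the
        player's game outcome, e.g. 1.0 win, 0.5 draw, 0.0 loss.
    Returns (bin_low, bin_high, mean_score, n_games) for each non-empty bin.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for strength, score in results:
        if not 0.0 <= strength <= 1.0:
            raise ValueError("opponent strength must be in [0, 1]")
        # Strength 1.0 is folded into the last bin.
        idx = min(int(strength * n_bins), n_bins - 1)
        sums[idx] += score
        counts[idx] += 1

    profile = []
    for i in range(n_bins):
        if counts[i]:
            profile.append(
                (i / n_bins, (i + 1) / n_bins, sums[i] / counts[i], counts[i])
            )
    return profile
```

Two players with the same overall mean score can have very different profiles under this kind of breakdown, which is the sort of difference a single aggregate measure hides.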
Similar resources
Neuro-Evolution Through Augmenting Topologies Applied To Evolving Neural Networks To Play Othello
Many different approaches to game playing have been suggested including alpha-beta search, temporal difference learning, genetic algorithms, and coevolution. Here, a powerful new algorithm for neuroevolution, Neuro-Evolution for Augmenting Topologies (NEAT), is adapted to the game playing domain. Evolution and coevolution were used to try and develop neural networks capable of defeating an alph...
Learning to Play Othello with N-Tuple Systems
This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described, and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best performing of these after just five hundred games o...
The performance profile: A multi-criteria performance evaluation method for test-based problems
In test-based problems, solutions produced by search algorithms are typically assessed using average outcomes of interactions with multiple tests. This aggregation leads to information loss, which can make different solutions appear indistinguishable and hinder comparison of search algorithms. In this paper we introduce the performance profile, a generic, domain-independent, multi-criteria performan...
Effect of look-ahead search depth in learning position evaluation functions for Othello using epsilon-greedy exploration
This paper studies the effect of varying the depth of look-ahead for heuristic search in temporal difference (TD) learning and game playing. The acquisition of position evaluation functions for the game of Othello is studied. The paper provides important insights into the strengths and weaknesses of using different search depths during learning when ε-greedy exploration is applied. The main findin...
The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy versus EVO-rummy
Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...